Whenever possible, the efficacy of a new treatment, such as a drug or behavioral intervention, is
investigated by randomly assigning some individuals to a treatment condition and others to a control 
condition, and comparing the outcomes between the two groups. Often, when the treatment aims to 
slow an infectious disease, groups or clusters of individuals are assigned en masse to each treatment 
arm. The structure of interactions within and between clusters can reduce the power of the trial, i.e. 
the probability of correctly detecting a real treatment effect. We investigate the relationships among 
power, within-cluster structure, between-cluster mixing, and infectivity by simulating an infectious 
process on a collection of clusters. We demonstrate that current power calculations may be conservative
for low levels of between-cluster mixing, but failing to account for moderate or high amounts
can result in severely underpowered studies. Power also depends on within-cluster network structure 
for certain kinds of infectious spreading. Infections that spread opportunistically through very highly 
connected individuals have unpredictable infectious breakouts, which makes it harder to distinguish 
between random variation and real treatment effects. Our approach can be used before conducting a 
trial to assess power using network information if it is available, and we demonstrate how empirical 
data can inform the extent of between-cluster mixing. 
Introduction


In order to determine how effective a treatment is, it is common to randomly assign test subjects to different treatment 
arms. In one arm, subjects receive the experimental treatment, and subjects in the other arm receive usual care or 
a placebo. Randomization helps to ensure that the treatment is the cause of any difference in outcomes between 
the subjects in the two treatment arms, as opposed to some pre-treatment characteristics of the individuals. If 
the treatment is effective, the probability that a trial will find a statistically significant difference attributed to the 
treatment is called the power of the trial. Adequate power requires a sufficiently large number of subjects to be
tested, which can be expensive or infeasible. Underpowered studies are not only less likely to find a true relationship
if one exists, but they are also more likely to erroneously conclude that an effect exists when it does not. In order
to control the probability of these errors, it is important to be able to accurately assess power before conducting a 
study. 

When designing a randomized trial, we may not want or be able to randomly assign individuals to treatment. 
Individuals may be members of a cluster with complex interactions, which makes it infeasible or unethical to assign 
some individuals within a cluster to treatment and others to control. For example, the spread of HIV from infected 
to uninfected individuals in a small village might be slowed by offering its members information about safer sexual 
practices. In this case, it may be difficult or unethical to keep treated individuals’ sex partners from sharing 
information or resources. We may instead choose to randomly select villages to participate in this regime, where 
villages correspond to naturally occurring clusters, and to compare HIV infection rates between treatment and control 
villages. This type of experiment is called a Cluster Randomized Trial (CRT).

The correlation in outcomes of individuals within a cluster (e.g. HIV infection statuses) is known to reduce the 
power of a trial. This correlation is generally summarized by a single parameter, called the Intracluster Correlation
Coefficient (ICC), which is the average pairwise correlation of outcomes within clusters. This measure assumes
that the correlation in outcomes for any two individuals within a cluster is identical. However, the structure of
relationships within a cluster can be heterogeneous, and power may depend on that structure, which is not captured
by the ICC. Usually, this structure is either ignored or the analysis is performed using methods that allow it to be
left unspecified. Furthermore, individuals often interact with others not only in the same cluster but
also in other clusters, which can reduce the difference in outcomes between treated and untreated clusters, thereby
decreasing power. For example, economic ties may exist between villages, the residents of which might then share
information related to the treatment. If the treatment succeeds in slowing the infection rate in the treatment cluster,
mixing between clusters will decrease the difference between outcomes in mixed clusters, so the power to detect a
treatment effect will decrease and the probability of a false discovery will increase. This must be addressed either by
adding more clusters to the trial or by increasing cluster sizes, both of which could be difficult and costly. This issue is
also often left unaddressed.

The effect of within-cluster structure and between-cluster mixing may depend on the type of infection spreading
through each cluster. For example, a highly contagious infectious disease like the flu can spread more efficiently
through more highly connected individuals. Other infectious diseases, such as a sexually transmitted disease, can
only be transmitted to one person at a time, no matter how many partners one has. The number of individuals whom 
an infected person may infect at a given time is the person’s infectivity. This quantity likely differs from person to 
person, and it depends crucially on the transmission dynamics of the disease. 

In this paper, we study, via simulation, the effect of within-cluster structure, the extent of between-cluster mixing, and 
infectivity on statistical power in CRTs. We simulate the spread of an infectious process and investigate how power 
is affected by features of the process. Specifically, we consider two infections with different infectivities spreading 
through a collection of clusters. We use a matched-pairs design wherein clusters in the study are paired, and each pair
has one cluster assigned to treatment and one to control. We model the complex within-cluster correlation structure as
a network in which edges represent possible transmission pathways between two individuals, comparing results across
three different well-known network models. We introduce a single parameter γ that summarizes the extent of mixing
between the two clusters comprising each cluster pair. This approach departs from standard power calculations
for CRTs, in which the researcher applies a formula that determines the required sample size as a function of the
number and size of clusters, the ICC, and the effect size. Figure 1 depicts the different assumptions behind these
two approaches. We show that our measure of mixing between clusters can have a strong effect on experimental 
power, or the probability of correctly detecting a real treatment effect. We also show that within-cluster structure 
can affect power for certain kinds of infectivity. We contrast this method to standard power calculations. We end by 
demonstrating how to assess between-cluster mixing before designing a hypothetical CRT, using a network dataset 
of inter-regional cell phone calls. 

Figure 1: A schematic comparing the Intracluster Correlation Coefficient (ICC) approach to the design of this study.
Each panel shows a cluster pair, and each enclosure represents a cluster. Panel a depicts cluster pair outcomes 
(circle colors) which are correlated (gray shading) within each cluster according to the ICC. In contrast, Panel b 
shows specific relationships (contact network ties) among individuals both within and between the two clusters, and 
outcomes among them will depend on an infection spreading only through these ties. We show that modeling both 
contact network structure and the spreading process explicitly rather than modeling correlations across outcomes 
results in new findings about power in CRTs. 


Methods 

We simulate both within-cluster structure and between-cluster mixing using network models. We simulate pairs of 
clusters with each cluster in each pair initially generated as a stand-alone network. We examine the Erdős-Rényi
(ER), Barabási-Albert (BA), and stochastic blockmodel (SBM) random networks, and we simulate 2C clusters
composed of n nodes each. In order to explicitly allow for between-cluster mixing, we define a between-cluster mixing
parameter γ as the number of network edges between the treatment cluster and the control cluster, divided by the
total number of edges in the cluster pair. To ensure that proportion γ of the edges are shared across clusters, we
perform degree-preserving rewiring within each of the C cluster pairs until proportion γ of the edges are shared between
clusters. We then use a compartmental model to simulate the spread of an infection across each cluster pair. All
nodes are either susceptible (S) or infected (I), and nodes may only transition from S to I. The number of neighbors
each node can potentially infect at any given time is called its infectivity. We consider both unit and degree infectivity,
for which infected nodes may contact one or all of their neighbors at a given time, respectively. Infected individuals in treated and control
clusters infect their neighbors with equal probability under the null hypothesis, and infected individuals in treatment
clusters infect with reduced probability under the alternative hypothesis. Finally, we analyze the resulting trial
under two different analysis scenarios, and we juxtapose our findings with a standard power calculation. Table 1
summarizes our general simulation algorithm. Next, we discuss each step in more detail.


1  Networks:   Generate C cluster pairs using user-specified random networks.
2  Mixing:     Perform degree-preserving rewiring between the two clusters in each pair until proportion γ of ties are shared across them.
3  Spreading:  Simulate a spreading process according to a suitable compartmental model.
4  Analysis:   Assess the empirical power of the simulation using the outcomes from the spreading process.


Table 1: Our simulation algorithm used to assess the effect of within-cluster structure, between-cluster mixing and 
infectivity on statistical power. 


Networks. Infectious disease dynamics have been studied extensively using deterministic ordinary differential
equations as well as network simulation. Using networks to simulate the spread of infection allows rich epidemic
detail, and this added complexity facilitates exploration of the effect of cluster structure on power in CRTs. A brief 
treatment of these features using differential equations is in the supplement (SI). 

A simple network G consists of a set of n nodes (individuals) and a set of binary pairwise edges (relationships)
between the nodes. This structure can be compactly expressed by a symmetric n × n adjacency matrix A. If an edge
exists between individuals i and j then A_ij = A_ji = 1, and 0 otherwise. The degree of node i, denoted by k_i, is the
number of edges connecting node i to other nodes in the network. Networks can be used to describe complex systems
like social communities, the structure of metabolic pathways, and the World Wide Web; many reviews of this work
are available.

A random network ensemble is a collection of all possible networks specified either by a probability model or a
mechanistic model. The simplest and most studied random network is the Erdős-Rényi (ER) model, which assumes
that each potential edge between any pair of nodes in a network occurs independently with a fixed probability. Nodes
in an ER network tend to have degrees close to their shared expected value, while in real-world social and contact
networks, the distribution of node degrees is typically heavy-tailed: a few nodes are very highly connected (“hubs”),
but most have small degree. To capture degree heterogeneity, we also simulate networks from the Barabási-Albert
(BA) model. These networks are generated beginning with a small group of connected nodes and successively
adding nodes one at a time, connecting them to the nodes in the existing network with probability proportional
to the degree of each existing node. This mechanism has been shown to yield a power-law degree distribution,
P(k) ∼ k^{-α} with α = 3. This distribution is heavy-tailed, so highly connected individuals are more likely
than in other network models like the ER. While it can be difficult to assess whether an
observed network has a power-law degree distribution, the BA model comes closer to capturing the heavy-tailed
degree distributions observed in social networks than the ER model. Another hallmark of real-world social networks
is that individuals tend to cluster together into communities, or groups of individuals who share more edges with
each other than with individuals outside the group. We use stochastic blockmodels (SBMs) to model within-cluster communities by
assuming that each node is a member of exactly one block in a partition B of all nodes in the network,
and that the probability of an edge between two nodes depends only on block membership (see supplementary
material S3 for additional details). Other popular families of random networks include Exponential Random Graph
Models (ERGMs) and the small-world networks of Watts and Strogatz, among others. We leave their implications for CRTs
for future research. Network instances were generated using Python’s networkx library. Each node within each cluster
has the same expected number of edges, ⟨k⟩ = 4. For Figures 4 and 5 we chose C = 20 and n = 300, because for
γ = 0 these parameters yield empirical power within 0.8 to 0.9, which is a typical range used in cluster randomized
trials.
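
The preferential-attachment mechanism behind the BA model can be illustrated with a short sketch. This is a minimal pure-Python illustration of the mechanism, not the networkx generator used for the simulations; the function name, the star-shaped starting core, and the choice m = 2 (which gives a mean degree close to 4) are assumptions made for the example.

```python
import random
from collections import Counter


def preferential_attachment(n, m, rng):
    """Grow a graph in which each new node links to m existing nodes,
    chosen with probability proportional to their current degree."""
    # Assumed starting core: a star on m + 1 nodes (node 0 is the centre).
    edges = [(0, v) for v in range(1, m + 1)]
    # Each node appears in `stubs` once per incident edge, so a uniform
    # draw from `stubs` is a degree-proportional draw over nodes.
    stubs = [v for edge in edges for v in edge]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(stubs))
        for t in targets:
            edges.append((new, t))
            stubs += [new, t]
    return edges


rng = random.Random(0)
edges = preferential_attachment(n=300, m=2, rng=rng)
degree = Counter(v for edge in edges for v in edge)
mean_degree = 2 * len(edges) / 300
max_degree = max(degree.values())
```

Comparing `max_degree` with `mean_degree` gives a quick sense of how far the best-connected hub sits from a typical node.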

Network mixing. In each cluster pair, one cluster is randomly assigned to treatment and the other is not. The 
mixing parameter γ can be expressed in terms of the entries in the adjacency matrix A and the treatment assignment
of clusters:

\begin{align}
\gamma &:= \frac{\sum_{i,j} A_{ij}\,\bigl(1 - \delta(r_i, r_j)\bigr)}{\sum_{i,j} A_{ij}} \tag{1} \\
       &= \frac{1}{2m} \sum_{i,j} A_{ij}\,\bigl(1 - \delta(r_i, r_j)\bigr). \tag{2}
\end{align}

Here, m is the total number of edges in the study, r_i = 1 if node i is in the treatment arm and r_i = 0 otherwise, and
δ(a, b) is equal to 1 when a = b and 0 otherwise. This definition of between-cluster mixing is closely related to the
concept of modularity, used extensively in network community detection (see supplementary material S2). If γ = 0,
the two clusters share no edges with each other. If γ = 1/2, there are as many edges reaching across two clusters as
exist within them. Finally, if γ = 1, edges are only found between clusters, and the cluster pair network is said to
be bipartite. A schematic of network mixing is shown in Figure 2.
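
As a small worked illustration of the mixing parameter, the sketch below counts the share of edges whose endpoints lie in different arms. The function name and the toy edge list are hypothetical; the calculation assumes each undirected edge is listed once, so no factor of 2 is needed.

```python
def mixing_parameter(edges, arm):
    """Fraction of edges whose endpoints lie in different treatment arms.

    edges: iterable of (i, j) pairs, each undirected edge listed once.
    arm:   dict mapping node -> 0 (control) or 1 (treatment).
    """
    edges = list(edges)
    if not edges:
        raise ValueError("need at least one edge")
    between = sum(1 for i, j in edges if arm[i] != arm[j])
    return between / len(edges)


# Toy cluster pair: nodes 0-2 are treated, nodes 3-5 are controls.
arm = {0: 1, 1: 1, 2: 1, 3: 0, 4: 0, 5: 0}
edges = [(0, 1), (1, 2), (3, 4), (4, 5), (2, 3)]
gamma = mixing_parameter(edges, arm)  # one of the five edges crosses arms

# A bipartite pair: every edge crosses between the two clusters.
gamma_bipartite = mixing_parameter([(0, 3), (1, 4), (2, 5)], arm)
```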



Figure 2: A diagram showing two clusters with various proportions of mixing. 


Network rewiring. We first simulate two random networks from the same network model and with the same 
number of edges, each corresponding to a cluster in a pair of clusters. Then, we randomly select one edge from each 
cluster in the pair and remove these two edges. Finally we create two new edges among the four nodes such that the 
two edges reach across the cluster pair. This process is called degree-preserving rewiring because it preserves the
degrees of all the nodes involved. The process is depicted in Figure 3. We repeat the rewiring process until proportion
γ of the total edges are rewired. The result is a single cluster pair in our simulated CRT, and the pair-generating
process is repeated until we have generated our target number of cluster pairs. 
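
The rewiring step can be sketched as follows. This is a minimal illustration under stated assumptions rather than the code used for the paper: the function name, the rejection of duplicate cross-cluster edges, and the `max_tries` guard are choices made for the example, and the two ring-shaped clusters are toy inputs.

```python
import random
from collections import Counter


def rewire_to_gamma(edges_a, edges_b, gamma, rng, max_tries=100_000):
    """Turn pairs of within-cluster edges into cross-cluster edges.

    Each swap removes one edge from cluster A and one from cluster B and
    adds two edges that cross the pair, so every node keeps its degree.
    Returns (within_a, within_b, between).
    """
    within_a, within_b = list(edges_a), list(edges_b)
    between = set()  # stored as (node_in_A, node_in_B)
    target = round(gamma * (len(within_a) + len(within_b)))
    tries = 0
    while len(between) + 2 <= target and within_a and within_b and tries < max_tries:
        tries += 1
        ia, ib = rng.randrange(len(within_a)), rng.randrange(len(within_b))
        (a1, a2), (b1, b2) = within_a[ia], within_b[ib]
        # Two degree-preserving ways to reconnect; pick one at random.
        if rng.random() < 0.5:
            new = {(a1, b1), (a2, b2)}
        else:
            new = {(a1, b2), (a2, b1)}
        if new & between:
            continue  # would duplicate an existing cross edge; redraw
        within_a.pop(ia)
        within_b.pop(ib)
        between |= new
    return within_a, within_b, sorted(between)


# Toy input: two 10-node rings, so every node has degree 2.
ring_a = [(i, (i + 1) % 10) for i in range(10)]
ring_b = [(10 + i, 10 + (i + 1) % 10) for i in range(10)]
within_a, within_b, between = rewire_to_gamma(ring_a, ring_b, 0.4, random.Random(0))
degree = Counter(v for edge in within_a + within_b + between for v in edge)
```

Because each swap converts exactly two within-cluster edges into two cross-cluster edges, the achievable number of shared edges is always even.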

Figure 3: Degree-preserving rewiring is performed by selecting an edge within each cluster, and swapping them to
reach across the cluster pair. The dashed gray lines represent another way the edges could have been rewired while 
still preserving degree; either rewiring is chosen with equal probability. 


Infectious spread. Compartmental models assume that each node in a population is in one of a few possible 
states, or compartments, and that individuals switch between these compartments according to some rules. Although 
more realistic models include more states, we will assume for simplicity that nodes are in only one of two states:
uninfected but susceptible (S), and infected and contagious (I). We assume that the network structure of each
cluster pair represents the possible transmission paths from infected nodes to susceptible ones. 

Let I_{irct} represent the infectious status for node i in treatment arm r ∈ {0, 1} and cluster pair c = 1, ..., C at discrete
time t, with I_{irct} = 1 if the node is infected and 0 otherwise. We define r = 0 if node i is in the control
arm, and r = 1 if i is in the treatment arm. Let I_{rct} := ⟨I_{irct}⟩ represent the proportion of infected nodes in arm r of cluster
pair c at discrete time t. At the beginning of the study, 1% of individuals in each cluster are infected, i.e. I_{rc0} = 0.01.
At each time step t, each infected node i selects q_i network neighbors at random, and infects each one with probability p_i.
Because different infectious diseases have different infectivity behavior, we study both unit and degree infectivity,
or q_i = 1 and q_i = k_i, respectively. We assume that the infection probability depends only on the treatment arm
membership r_i of each node, thus p_i = p_{r_i}. Treatment reduces the probability p_{r_i} of infection. If two clusters in
a pair have the same infection rate, the treatment has no effect and p_0 = p_1 = p. This is the null hypothesis under
examination in our hypothetical study. When we simulate trials under the null hypothesis we set p = 0.30 in every
cluster. The alternative hypothesis holds if the treatment succeeds in reducing the infection rate, p_1 < p_0. When we
simulate under the alternative hypothesis, p_0 = 0.30 and p_1 = 0.25. The trial ends when the cumulative incidence of
infection grows to 10% of the population, i.e., when the cluster pair infection rate ⟨I_{rcT_c}⟩ = 0.1 for some time T_c.
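
The spreading process described above might be sketched as follows. This is an illustrative implementation under stated assumptions, not the authors' code: the function names, the `max_steps` guard, and the two ring-shaped toy clusters are hypothetical, and the infection probability is taken from the arm of the infecting node.

```python
import random


def simulate_pair(neighbors, arm, p, infectivity, rng,
                  seed_frac=0.01, stop_frac=0.10, max_steps=10_000):
    """Discrete-time S -> I spread on one cluster pair.

    neighbors:   dict node -> list of neighbouring nodes.
    arm:         dict node -> 0 (control) or 1 (treatment).
    p:           dict arm -> per-contact infection probability.
    infectivity: "unit" (contact one neighbour) or "degree" (contact all).
    Returns dict node -> infection time (initial seeds have time 0).
    """
    nodes = list(neighbors)
    infected_at = {}
    # Seed the same fraction of each cluster at time 0.
    for r in (0, 1):
        members = [v for v in nodes if arm[v] == r]
        n_seed = max(1, round(seed_frac * len(members)))
        for v in rng.sample(members, n_seed):
            infected_at[v] = 0
    t = 0
    # Stop once cumulative incidence reaches stop_frac of the pair.
    while len(infected_at) < stop_frac * len(nodes) and t < max_steps:
        t += 1
        newly = []
        for v in list(infected_at):
            nbrs = neighbors[v]
            if not nbrs:
                continue
            contacts = nbrs if infectivity == "degree" else [rng.choice(nbrs)]
            for u in contacts:
                if u not in infected_at and rng.random() < p[arm[v]]:
                    newly.append(u)
        for u in newly:
            infected_at.setdefault(u, t)
    return infected_at


def ring(nodes):
    """Toy cluster: each node is linked to its two ring neighbours."""
    k = len(nodes)
    return {v: [nodes[i - 1], nodes[(i + 1) % k]] for i, v in enumerate(nodes)}


neighbors = {**ring(list(range(100))), **ring(list(range(100, 200)))}
arm = {v: int(v < 100) for v in neighbors}
times = simulate_pair(neighbors, arm, p={0: 0.30, 1: 0.25},
                      infectivity="degree", rng=random.Random(1))
```

The returned infection times supply both the end-of-trial proportions and the time-to-infection data.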


Analysis. At the end of the simulation, we test whether the treatment was effective by comparing the number of
infections between treated and control clusters according to two analysis scenarios. In real-world CRTs, the most
efficient and robust way to compare the two groups depends on what information about the infection can feasibly
be gathered from the trial. In some trials, surveying the infectious status of individuals is difficult, and therefore
this information is only available for the beginning and end time points of the trial. In others, the times to infection
for each node are available. In addition to what information is available, the researcher must choose a statistical
test according to which assumptions they find suitable to their study. A model-based test assumes that the data
are generated according to a particular model, and it can be more powerful than other tests if the model is true.
Alternatively, a permutation test does not make any assumptions about how the data were generated. To show
how to conduct an analysis suited to different scenarios based on available data, we analyzed our simulated trial using
two different sets of assumptions. In Scenario 1, we assume that outcomes are only known at the end of the trial,
and perform a model-based test. In Scenario 2, we assume that the time to each infection is known, and perform a
permutation test. We show that the results of the simulation are qualitatively similar under both scenarios. (Note
that it is possible to use a permutation test for Scenario 1 or a model-based test for Scenario 2, which would create
two new analyses.)

Scenario 1: The log risk ratio is the logarithm of the ratio of the proportions of infected individuals in the control and
treatment clusters at the end of the study. For simulation m, let l_m := ⟨log(I_{0cT_c} / I_{1cT_c})⟩ = ⟨log I_{0cT_c} − log I_{1cT_c}⟩ be the difference in the
log proportion of infections between the two clusters in a pair, averaged over the C cluster pairs at the trial end T_c. The
simulation was repeated 20,000 times under the null hypothesis, and cutoff values l_{2.5} and l_{97.5} were established such
that P(l_{2.5} < l_m < l_{97.5}) = 1 − α for significance level α = 0.05. We repeated this process under the alternative 20,000
times, and the proportion of these trials with statistics more extreme than (l_{2.5}, l_{97.5}) is the simulated power
or empirical power.
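
The cutoff-based power estimate of Scenario 1 can be sketched as below. The function names are hypothetical, the quantile rule is a simple order-statistic choice, and the Gaussian draws are made-up placeholders standing in for simulated trial statistics; they are not results from the paper.

```python
import math
import random
import statistics


def trial_statistic(pairs):
    """Mean over cluster pairs of log(control risk) - log(treatment risk).

    pairs: list of (control_proportion, treatment_proportion) per pair.
    """
    return statistics.fmean(math.log(i0) - math.log(i1) for i0, i1 in pairs)


def empirical_power(null_stats, alt_stats, alpha=0.05):
    """Share of alternative-hypothesis statistics outside the null cutoffs."""
    ordered = sorted(null_stats)
    lo = ordered[int((alpha / 2) * (len(ordered) - 1))]
    hi = ordered[int((1 - alpha / 2) * (len(ordered) - 1))]
    return sum(1 for s in alt_stats if s < lo or s > hi) / len(alt_stats)


# Placeholder statistics only: in a real run each value would come from
# trial_statistic applied to one simulated trial.
rng = random.Random(0)
null_stats = [rng.gauss(0.0, 0.1) for _ in range(2000)]
alt_stats = [rng.gauss(0.3, 0.1) for _ in range(2000)]
power = empirical_power(null_stats, alt_stats)
```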

Scenario 2: We pool the individual infection times for the treatment arm and the control arm, and summarize the
difference between the two arms’ infection times using an appropriate statistic (e.g. the log-rank statistic). The
permutation test is performed by comparing the observed log-rank statistic to the distribution of log-rank statistics
obtained when the treatment labels are permuted, or switched, within each cluster pair. The p-value for this analysis is the
proportion of permuted log-rank statistics that are at least as extreme as the statistic computed with the real labels.
Because the permutation test is computationally expensive, this entire process is repeated 2,000 times, and we
calculate the proportion of permutation p-values below 0.05, which is the empirical or simulated power.
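
A matched-pair permutation test of this kind can be sketched as follows. To keep the example short, the difference in pooled mean infection times is swapped in for the log-rank statistic; the function names, the add-one correction in the p-value, and the toy infection times are all assumptions of the sketch.

```python
import random
import statistics


def mean_time_difference(pairs, flips):
    """Pooled mean infection time, treatment arm minus control arm.

    pairs: list of (treated_times, control_times), one entry per cluster pair.
    flips: list of bools; True swaps the treatment labels within that pair.
    """
    treated, control = [], []
    for (x, y), flip in zip(pairs, flips):
        t, c = (y, x) if flip else (x, y)
        treated.extend(t)
        control.extend(c)
    return statistics.fmean(treated) - statistics.fmean(control)


def permutation_p_value(pairs, n_perm, rng):
    """Two-sided p-value from random within-pair label swaps."""
    observed = abs(mean_time_difference(pairs, [False] * len(pairs)))
    hits = 0
    for _ in range(n_perm):
        flips = [rng.random() < 0.5 for _ in pairs]
        if abs(mean_time_difference(pairs, flips)) >= observed:
            hits += 1
    # Add-one correction counts the observed labelling as one permutation.
    return (hits + 1) / (n_perm + 1)


# Toy data: infections arrive much later in every treated cluster.
pairs = [([10, 11, 12], [1, 2, 3]) for _ in range(10)]
p_value = permutation_p_value(pairs, n_perm=999, rng=random.Random(0))
```

Repeating this over many simulated trials and recording how often the p-value falls below 0.05 gives the empirical power.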

We also compared this formulation to traditional methods, for which we take the formulas in Hayes and Bennett to
be representative. In this calculation, power is a function of the number and size of clusters, the expected log risk ratio,
and the expected average pairwise correlation of outcomes within each cluster (ICC). The value of the ICC must
be assumed beforehand or estimated in a small pilot study. To compare this approach with our simulation design,
we assumed that the ICC took on a range of plausible empirical values, 0.003 to 0.06, reported in the literature. For
more details, see supplementary material S4.


Results 

We begin by showing the effect of the mixing parameter γ on the infection risk ratios between treated and untreated
clusters. The means and standard deviations of simulated risk ratios observed under Scenario 1 are presented in
Figure 4.

Figure 4: The log risk ratio means and standard deviations under Scenario 1. The rows correspond to the means
(Panels a and b) and standard deviations (Panels c and d), shown on the y-axis. The x-axis is the value of the
mixing parameter γ, and each curve represents one of the three within-cluster network structures. The left column shows
the spread of an infection in which an infected node may only infect one neighbor per time step (unit infectivity),
whereas the right column assumes one may spread an infection to each of their neighbors (degree infectivity). We
see that network topology has an effect on the variation of the log risk ratio only in the latter case.


For both kinds of infectivity, neither the heavy-tailed degree distribution of the BA network nor the within-cluster 
community structure of the SBM network dramatically impacts the differences between the proportion of infections
in the treated and control clusters in each pair (top row) compared to the ER network. The difference between
the risk of infection in the treated and untreated clusters of a pair decreases as mixing increases, and reverses direction
when γ > 1/2. This is expected because for this range of between-cluster mixing, infected individuals in the treatment
cluster are more likely to contact members of the untreated cluster than members of their own cluster, and vice versa; this is unlikely in practice but
is included here for completeness. In almost all cases, the variation in the simulated studies’ average log risk ratio
decreases uniformly as γ increases, which suggests that increasing the amount of mixing across communities results
in less variation in the average rate of infections. However, the BA network is an exception. Under degree infectivity,
when individuals can infect everyone to whom they are connected in a single time step, an infected node with large 
degree may spread its infection to each of its contacts at a single time point, which can cause a very fast outbreak. 
However, highly-connected individuals are rare, so in this case outbreaks are large but infrequent, increasing the 
variation in observed differences between treated and untreated clusters. This variation means that more clusters are 
required to estimate the average treatment effect with any precision. In other words, rare outbreaks make it harder 
to distinguish whether differences between the treatment arm and control arm are due to treatment or to a chance 
outbreak occurring in either arm. Therefore, under degree infectivity, the BA network results in less power than the 
SBM or ER networks, which shows that within-cluster network structure can impact the power to detect treatment 
effects in CRTs for certain kinds of infections. 

For the two analysis scenarios described in Methods, we can directly estimate empirical power as the proportion of 
simulations resulting in the rejection of the null hypothesis at the α = 0.05 level under the alternative for a range of
mixing values γ. Our results, as well as a comparison with the standard approach, are summarized in Figure 5.

Figure 5: Estimated power for each scenario. The blue, red, and green lines represent the ER, BA, and SBM network
models, respectively. The top row shows results for Scenario 1, and the bottom row shows results for Scenario 2. The 
left column shows unit infectivity, and the right column shows degree infectivity. The horizontal gray bars represent 
the expected power using the standard approach for a range of plausible values for the ICC. 


In all settings, power is lowest when γ ≈ 1/2, with approximately the same number of edges between clusters
as within them. Scenarios 1 and 2 (the top and bottom rows, respectively) show few differences from one another,
which suggests that the two strategies for significance testing tend to give qualitatively similar results. Unit infectivity
(left-hand column) shows no differences in power among network types. This is not the case for degree infectivity
(right-hand column), in which the BA network shows less power than the other networks, for the reasons discussed
above. Finally, the gray bars indicate that when no mixing is present, standard power calculations are conservative for
all network types we studied, and no sample size adjustment may be needed. However, under moderate to severe
between-cluster mixing, standard calculations can greatly overestimate expected power. In the case of the BA network and degree infectivity, the
standard approach always overestimates trial power.

Size and number of study clusters. Our results so far have shown how power in CRTs is affected by between-cluster
mixing, within-cluster structure, and infectivity. Next, we show how power relates to other trial features,
namely the size and number of clusters, n and C, respectively. The results are qualitatively similar for Scenarios 1
and 2, and the results shown in Table 2 are for Scenario 1. The table shows results for each combination of a range
of cluster sizes n = {100, 300, 1000} and numbers C = {5, 10, 20} as a 3 × 3 grid of pairs of cells. Each cell pair is a
side-by-side comparison of results for unit infectivity (left-hand cell) and degree infectivity (right-hand cell). Each cell
shows simulated results for within-cluster structure (columns) as well as amount of between-cluster mixing (rows).
Considering the case of C = 10, n = 300 (the middle-most cell pair), we notice a few trends. We see that increasing
mixing (looking down each column) decreases power in all cases. We can directly compare the two types of infectivity 
(comparing cells in the pair), and see that all the entries are similar except for the BA network (middle column). For 
BA networks, power is much lower for degree infectivity spreading compared to unit infectivity. This suggests that 
CRTs with network structure similar to BA networks can have substantially less power when the infection spreads 
in proportion to how connected each node is. Finally, we may compare studies of differing cluster numbers and 
sizes (comparing cell pairs), and see qualitatively similar results: in each case, more or larger clusters in the study 
(cell pairs further down or right) result in more power overall. When power is very high (bottom-right cell pair), 
within-cluster structure affects results less. Therefore, careful consideration of expected power is most important 
when trial resources are limited, which is often the case in practice. 


                         C = 5                            C = 10                           C = 20
                  Unit           Degree            Unit           Degree            Unit           Degree
    n    γ    ER   BA  SBM    ER   BA  SBM     ER   BA  SBM    ER   BA  SBM     ER   BA  SBM    ER   BA  SBM
  100  0.0   0.13 0.14 0.14  0.12 0.10 0.12   0.22 0.22 0.23  0.19 0.16 0.20   0.39 0.39 0.38  0.34 0.28 0.37
       0.1   0.10 0.10 0.10  0.11 0.09 0.10   0.15 0.17 0.15  0.18 0.12 0.16   0.27 0.27 0.24  0.26 0.21 0.26
       0.2   0.08 0.08 0.07  0.09 0.07 0.09   0.09 0.13 0.11  0.11 0.11 0.13   0.15 0.15 0.15  0.22 0.15 0.18
       0.3   0.07 0.06 0.06  0.07 0.06 0.07   0.08 0.09 0.06  0.08 0.08 0.08   0.10 0.09 0.10  0.11 0.11 0.11
  300  0.0   0.34 0.33 0.32  0.33 0.20 0.33   0.57 0.55 0.57  0.59 0.34 0.56   0.85 0.86 0.86  0.87 0.57 0.87
       0.1   0.21 0.22 0.21  0.25 0.13 0.27   0.39 0.39 0.39  0.46 0.28 0.46   0.63 0.65 0.65  0.71 0.44 0.70
       0.2   0.16 0.15 0.15  0.17 0.14 0.19   0.23 0.22 0.22  0.28 0.19 0.30   0.41 0.43 0.40  0.52 0.38 0.44
       0.3   0.08 0.08 0.08  0.12 0.10 0.10   0.13 0.14 0.13  0.14 0.13 0.17   0.21 0.21 0.19  0.28 0.24 0.27
 1000  0.0   0.78 0.80 0.76  0.84 0.39 0.81   0.97 0.97 0.97  0.98 0.75 0.98   1.00 1.00 1.00  1.00 0.95 1.00
       0.1   0.61 0.57 0.59  0.69 0.38 0.67   0.85 0.86 0.85  0.93 0.61 0.91   0.99 0.99 0.99  1.00 0.91 1.00
       0.2   0.39 0.37 0.36  0.47 0.31 0.51   0.62 0.60 0.59  0.76 0.52 0.76   0.89 0.90 0.87  0.97 0.83 0.96
       0.3   0.15 0.19 0.16  0.30 0.21 0.28   0.31 0.33 0.36  0.49 0.36 0.45   0.58 0.56 0.57  0.74 0.62 0.73


Table 2: Experimental power in our simulation framework for different sizes and numbers of cluster pairs, n and C, 
respectively, for Scenario 1. Each cell shows output for 3,000 simulations of each combination of n and C, all three 
within-cluster structures, various values of mixing parameter γ, and both unit and degree infectivity. The results are
similar for Scenario 2. 


Real-world data and the extent of mixing. Finally, we show how our mixing parameter can be estimated using
data in the planning stages of a hypothetical CRT. Sometimes the entire network structure between individuals
in a prospective trial is known beforehand, such as the sexual contact network on Likoma Island. In this case,
between-cluster mixing can be estimated using Equation 2. In other trials, perhaps only partial information is known,
such as the degree distribution and/or the proportion of ties between clusters. In this case, clusters can be generated
that preserve the partial network information, such as the degree distribution, and degree-preserving rewiring can be
performed until a proportion $\gamma$ of ties between clusters is observed, where this quantity is estimated from the network
data, if possible.
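
As a rough illustration of the rewiring step described above, the following Python sketch exchanges one within-cluster edge from each cluster for two between-cluster edges, which leaves every node's degree unchanged, and repeats until the target proportion of between-cluster ties is reached. The function name `rewire_to_mixing`, the edge-list representation, and the stopping rule are illustrative assumptions, not the implementation used in the study.

```python
import random


def rewire_to_mixing(within0, within1, gamma, seed=0, max_tries=100_000):
    """Degree-preserving rewiring between two clusters (illustrative sketch).

    within0, within1 : lists of (u, v) edges inside cluster 0 and cluster 1
    gamma            : target proportion of edges running between the clusters
    Returns the remaining within-cluster edge lists and the between-cluster edges.
    """
    rng = random.Random(seed)
    within0, within1 = list(within0), list(within1)  # do not mutate the inputs
    between = set()  # stored as (node in cluster 0, node in cluster 1)
    total = len(within0) + len(within1)  # total edge count never changes
    tries = 0
    while len(between) / total < gamma and within0 and within1 and tries < max_tries:
        tries += 1
        i = rng.randrange(len(within0))
        j = rng.randrange(len(within1))
        a, b = within0[i]
        c, d = within1[j]
        # Swap (a-b, c-d) for (a-d, b-c): each node keeps its degree.
        new1, new2 = (a, d), (b, c)
        if new1 in between or new2 in between:
            continue  # would create a duplicate edge, so try another pair
        within0.pop(i)
        within1.pop(j)
        between.update([new1, new2])
    return within0, within1, sorted(between)
```

Each successful swap converts two within-cluster edges into two between-cluster edges, so the achieved proportion moves in steps of 2 divided by the total number of edges.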

The structure of calls between cell phones is often persistent over time and indicative of actual social relationships.
Using a network of cell phone calls as a proxy for a contact network, we apply our definition of between-cluster mixing
to estimate the amount of mixing between hypothetical clusters. The dataset consists of all the calls made between
cell phones of a large mobile carrier within a quarter year. Individual phone numbers were anonymized, and we only
report results for the number of individuals and calls within or between zip codes.


The dataset contains phone calls originating from Z = 3806 different zip codes, and we define a cluster as a collection 
of zip codes that are spatially close to one another. Because zip codes are numerically assigned according to spatial 
location, we assume that zip codes that are numerically contiguous to each other are also close to each other spatially. 
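Therefore, zip code $z = 1, \dots, Z$ is assigned to cluster $C_z \in \{1, \dots, 2C\}$ according to

$$ C_z := \left\lceil \frac{2C \, z}{Z} \right\rceil \qquad (3) $$

where 2C is the total number of clusters in the trial, and $\lceil \cdot \rceil$ is the ceiling function. Once the number of clusters 2C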
is specified, clusters may be paired, with one cluster in each pair randomized to a hypothetical treatment, and the 
other to the control condition. 
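
The clustering and randomization steps can be sketched in Python as follows. This is a minimal sketch under two assumptions that are not stated in the text: the assignment rule is taken to be the ceiling of 2Cz/Z, which groups numerically contiguous zip codes into 2C clusters of nearly equal size, and consecutive clusters (1 and 2, 3 and 4, and so on) are taken to form the pairs. The function names are hypothetical.

```python
import random


def assign_clusters(Z, C):
    """Map zip code index z = 1..Z to a cluster label in 1..2C (contiguous blocks)."""
    # Integer ceiling of 2*C*z / Z, avoiding floating-point rounding.
    return {z: (2 * C * z + Z - 1) // Z for z in range(1, Z + 1)}


def randomize_pairs(C, rng):
    """Assign one cluster of each assumed pair (2k-1, 2k) to treatment (1), the other to control (0)."""
    arm = {}
    for k in range(1, C + 1):
        first, second = 2 * k - 1, 2 * k
        treated = rng.choice((first, second))
        arm[first] = int(treated == first)
        arm[second] = int(treated == second)
    return arm


# Example with the number of zip codes reported in the text and a hypothetical C = 10.
cluster_of_zip = assign_clusters(Z=3806, C=10)
arm_of_cluster = randomize_pairs(C=10, rng=random.Random(0))
```

Repeating `randomize_pairs` with different random seeds gives the repeated randomizations over which a mixing estimate can be averaged.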


Next, we estimate mixing parameter $\gamma$ for this dataset. We consider two definitions for an edge $A_{ij}$ between
individuals i and j, belonging to clusters $C_i$ and $C_j$ respectively: one in which edges are unweighted, and one in
which they are weighted by the number of calls. The number of calls between i and j over the period of investigation is
defined as $d_{ij}$. For Definition 1, we assume an edge exists between the two individuals if they have called each other
at least once, $A_{ij} = I(d_{ij} \geq 1)$, and otherwise no edge exists between them, $A_{ij} = 0$. For Definition 2, we
assume the edge between them is weighted by the total number of calls made between them, $A_{ij} = d_{ij}$. Using both
definitions, we found the degree distribution of the cell phone network to be heavy-tailed (see supplementary material S5).
For a range of numbers of cluster pairs C, we cluster all Z zip codes into 2C clusters, and randomize one cluster in each
pair to a hypothetical treatment and the other to a control. For each of 200 randomizations, we calculate the
between-cluster mixing parameter $\gamma$ using Equation 2, and we examine the relationship between $\gamma$ and the number of
cluster pairs C. The mean and (2.5, 97.5) percentiles of these estimates as a function of C are shown in Figure 6.

Figure 6: A log-linear plot displaying empirical values of mixing parameter $\gamma$. The y axis shows the mean and
(2.5, 97.5) quantiles of these estimates. The x axis in each panel corresponds to a range of cluster numbers C.
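
The two edge definitions can be compared on a toy example. The Python sketch below assumes that the mixing parameter is the share of total edge weight connecting individuals in different treatment arms, which is how the text describes it; the function name `mixing_parameter` and the toy call counts are hypothetical.

```python
def mixing_parameter(calls, arm_of, weighted=False):
    """Fraction of (optionally call-weighted) edges that cross treatment arms.

    calls  : dict mapping an unordered pair (i, j) to the call count d_ij
    arm_of : dict mapping each individual to 0 (control) or 1 (treatment)
    """
    between = total = 0.0
    for (i, j), d_ij in calls.items():
        if d_ij < 1:
            continue  # no edge under either definition
        a_ij = d_ij if weighted else 1  # Definition 2 versus Definition 1
        total += a_ij
        if arm_of[i] != arm_of[j]:
            between += a_ij
    return between / total


# Toy data (hypothetical): callers a and b are in the control arm, c and d are treated.
calls = {("a", "b"): 5, ("c", "d"): 4, ("a", "c"): 1, ("b", "d"): 2}
arm_of = {"a": 0, "b": 0, "c": 1, "d": 1}
gamma_unweighted = mixing_parameter(calls, arm_of)                 # 2 of 4 edges cross arms
gamma_weighted = mixing_parameter(calls, arm_of, weighted=True)    # 3 of 12 calls cross arms
```

In this toy example the unweighted estimate exceeds the weighted one because the cross-arm ties carry fewer calls than the within-arm ties.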


Figure 6 displays a number of distinct trends. As the number of clusters increases, fewer of the total zip codes are
included in each cluster, and the number of calls between clusters increases. This means that individuals are more
likely to call others in zip codes geographically closer to them, which has been confirmed in other phone communication
networks. Between-cluster mixing unweighted by the number of calls (blue) results in higher estimates of
$\gamma$ than weighted (red), which means that when individuals call others outside their cluster, they tend to call those
people less often than others they call within their cluster. There is substantial between-cluster mixing for all values of
C, implying that between-cluster mixing would substantially decrease the power of a trial that assumes each cluster
to be independent ($\gamma = 0$). Furthermore, as the number of clusters increases, the average cluster size decreases,
and mixing reaches a maximum of $\gamma = 0.45$. Extrapolating from our simulation framework, power could be reduced
dramatically in this case.

Discussion 

Before conducting a trial, it is important to have an estimate of statistical power in order to assess the risks of failing 
to find true effects and of spurious results. If individuals belong to interrelated clusters, randomly assigning them 
to treatment or control may not be a palatable option, and CRTs can be used to test for treatment effects. Power 
in CRTs is known to depend on the number and size of clusters, as well as the amount of correlation within each
cluster. However, within-cluster correlation structure is often measured by a single number and clusters are usually 
assumed to be independent of one another. Unfortunately, these assumptions can produce misleading estimates of 
power. 

To investigate this problem, we studied the effects of complex within-cluster structure, a measure of between-cluster 
mixing strength, and infectivity on power by simulating a matched-pairs CRT for an infectious process. We simulated 
a collection of cluster pairs as a network, controlling the proportion of edges shared across each pair. We then 
simulated an SI infectious process on each cluster pair, with one cluster assigned to treatment and the other assigned 
to control. In this simulation, treatment lowered the probability that an infected individual succeeds at
infecting a susceptible neighbor. We also considered two types of infectivity: unit and degree.
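
A discrete-time SI process of this kind can be sketched in a few lines of Python. The sketch rests on assumptions that are not spelled out in this section: under unit infectivity an infected individual contacts one randomly chosen neighbor per time step, under degree infectivity it contacts all of its neighbors, and the transmission probability is determined by the arm of the infected individual. The function name `simulate_si` and its parameters are hypothetical.

```python
import random


def simulate_si(adj, arm, p, infectivity="unit", n_seeds=1, steps=50, seed=0):
    """Run an SI process on a network and return the set of infected nodes.

    adj : dict mapping each node to a list of its neighbours
    arm : dict mapping each node to 0 (control) or 1 (treatment)
    p   : pair (p0, p1) of per-contact transmission probabilities by infector's arm
    """
    rng = random.Random(seed)
    infected = set(rng.sample(sorted(adj), n_seeds))
    for _ in range(steps):
        newly = set()
        for i in sorted(infected):
            if not adj[i]:
                continue  # isolated node cannot transmit
            # Assumed: unit = one random contact per step, degree = every neighbour.
            contacts = [rng.choice(adj[i])] if infectivity == "unit" else adj[i]
            for j in contacts:
                if j not in infected and rng.random() < p[arm[i]]:
                    newly.add(j)
        infected |= newly  # synchronous update at the end of the step
    return infected
```

Running the function many times with different seeds, and comparing infection counts between the two arms, gives the raw material for a simulated power estimate.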

We found that between-cluster mixing had a profound effect on statistical power, no matter what network or infectious
process was simulated. As the proportion of edges shared across clusters in different treatment groups increased to 1/2,
the two clusters became nearly indistinguishable on average, and thus power fell to nearly zero. This is not surprising,
but most power calculations assume clusters are independent, and this issue is usually left unaddressed. We compared
these findings to the ICC approach, and found that it will significantly overestimate expected power if the extent of
between-cluster mixing is moderate to severe.

The effect of within-cluster structure was more nuanced. For degree infectivity, the spread of infection was less 
predictable if the network contained some highly-connected nodes, due to the variation in and strong effects of these 
hubs becoming infected. We did not observe this level of variability for networks without highly-connected hub 
nodes. We also did not observe this level of variability for unit infectivity, regardless of how many hubs were present 
in the network. Taken together, we found that for the network structures we studied, within-cluster structure had a 
significant impact on power only when the infectious process exhibited degree infectivity. The effects of within-cluster
structure and between-cluster mixing on statistical power are qualitatively similar for a range of cluster sizes and
numbers, although (as is well known) an increase in either results in more power overall.

Our simulation framework can be used to estimate power before an actual trial. If partial or full network information
is available, it can be used to simulate an infectious process using a compartmental model, and to analyze the resulting
outcomes as we have described. We demonstrated how to estimate between-cluster mixing using a dataset composed
of cell phone calls from a large mobile carrier, which are taken to represent a contact network. For a hypothetical
prospective trial on the individuals in this dataset, we defined a cluster as a group of individuals within a collection
of contiguous zip codes. We then grouped clusters into pairs, randomly assigned one cluster in each pair to a
hypothetical treatment condition and the other to a control, and estimated mixing parameter $\gamma$ for each randomization.
We found substantial between-cluster mixing for all choices of cluster numbers, and mixing increased when clusters
were chosen to be more numerous but smaller. Estimates of between-cluster mixing ranged from moderate to severe,
regardless of whether the estimation adjusted for the frequency of calls or not.

Our study invites several investigations and extensions. First, we have employed restrictively simple network models
and infectious spreading processes, and more nuanced generalizations are available. While our work shows how
infectious spreading and complex structure can affect expected results in CRTs, more specific circumstances require
extensions with more tailored network designs and infection types for power to be properly estimated. Second, we
have focused our attention on matched-pair CRTs, and our framework should be extended to other CRT designs
used in practice. Third, these findings should be replicated in data for which both network structure and infectious
spread are available.

Supplementary Material


In this supplement, we provide additional details for a few topics discussed in the main paper. Section S1 demonstrates
a simple approach to modeling infectious spread with between-cluster mixing using ordinary differential equations,
and compares this result to the simulation approach introduced in the paper. Section S2 connects our definition of
between-cluster mixing parameter $\gamma$ with a common metric used in applications of network science. Section S3
describes the stochastic blockmodel and provides details for the specific model we used in our paper. Section S4 describes
how the Intracluster Correlation Coefficient is defined, and shows estimates of this quantity for our simulations.
Finally, Section S5 shows the degree distribution for the empirical cell phone network, with discussion.


S1: Ordinary Differential Equation approach to epidemic spreading with
between-cluster mixing. 


One of the most common approaches to investigating the spread of an epidemic on networks is Ordinary Differential
Equations (ODEs). An ODE relates a function of a single variable to its derivatives. Compartmental models for
epidemic spread can use ODEs to specify the rate of change of each compartment in terms of the others. A common assumption
used to specify ODEs for epidemic spread is mass action, in which the spread of an infection depends only on the
proportion of individuals in each compartment. For example, an SI compartmental model assumes that individual
i is either infected ($I_i(t) = 1$) or not infected but susceptible ($S_i(t) = 1$) at any time t. These two statuses are
mutually exclusive, so $S_i(t) = 1 - I_i(t)$. An ordinary differential equation that assumes mass action would specify
the change in the total proportion of infected individuals $I(t) := \langle I_i(t) \rangle$ in terms of the infected proportion $I(t)$ at
time t. If we assume mass action, we may model the rate of infectious growth in an SI compartmental model as
proportional to the proportion of infected individuals multiplied by the proportion of susceptible individuals:

$$ \frac{dI(t)}{dt} = p \, S(t) \, I(t) = p \, (1 - I(t)) \, I(t) \qquad (4) $$
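
For reference, the mass action SI equation above is the logistic equation, and it can be solved in closed form by separating variables. The short derivation below is a standard result rather than something taken from the paper, and the symbol $I_0 := I(0)$ for the initial infected proportion is introduced here only for illustration.

```latex
\frac{dI}{I\,(1 - I)} = p \, dt
\;\;\Longrightarrow\;\;
\ln\!\frac{I(t)}{1 - I(t)} = p\,t + \ln\!\frac{I_0}{1 - I_0}
\;\;\Longrightarrow\;\;
I(t) = \frac{I_0 \, e^{p t}}{1 - I_0 + I_0 \, e^{p t}}
```

The solution grows roughly exponentially at rate $p$ while $I(t)$ is small and saturates at 1, so a treatment that lowers $p$ delays the epidemic curve without changing its final level.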


In this paper, we consider a collection of $c = 1, \dots, C$ cluster pairs, with one cluster in each pair assigned to the
treatment condition $r = 1$ and the other to control $r = 0$. Furthermore, we assume that clusters are mixed according
to mixing parameter $\gamma$. For the SI compartmental model, $I_{irc}(t) = 1$ if individual i is infected and 0 otherwise. We
may assume that the spread of an infection across the network pair is a mass action ODE as above, with a simple
modification. Let $I_{rc}(t) := \langle I_{irc}(t) \rangle$ represent the proportion of infected nodes in the arm-r cluster of pair c at time t.
Individual i may contact an individual j in the opposing cluster with probability $\gamma$. In this case, a
successful infection requires that i is susceptible and j is infectious. Mass action dictates that the rate of change for
each cluster depends only on the proportion of individuals in each infectious status for either cluster, which is now a
sum of terms weighted by mixing parameter $\gamma$:

$$ \frac{dI_{0c}(t)}{dt} = \left[ (1 - \gamma) \, I_{0c}(t) \, p_0 + \gamma \, I_{1c}(t) \, p_1 \right] (1 - I_{0c}(t)) \qquad (5) $$

$$ \frac{dI_{1c}(t)}{dt} = \left[ (1 - \gamma) \, I_{1c}(t) \, p_1 + \gamma \, I_{0c}(t) \, p_0 \right] (1 - I_{1c}(t)) \qquad (6) $$

According to Supplementary Equations 5 and 6, if $\gamma = 0$, the rate of infection in each cluster is identical to
Supplementary Equation 4. As $\gamma$ approaches 1/2, the difference in the proportion of infected individuals in the two
treatment arms shrinks to zero.
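
The behavior of the coupled system can be checked numerically. The Python sketch below integrates the two-cluster mass action equations with a simple forward Euler scheme, assuming that $p_0$ and $p_1$ are the transmission rates attached to infected individuals in the control and treatment clusters. The numerical values of `p0`, `p1`, `i_init`, and `dt`, and the function name `simulate_pair_ode`, are illustrative assumptions rather than settings from the paper.

```python
def simulate_pair_ode(gamma, p0=0.3, p1=0.15, i_init=0.01, dt=0.01, t_max=20.0):
    """Forward Euler integration of the mixed SI equations for one cluster pair.

    Returns a list of (t, I0, I1): infected proportions in the control (0)
    and treatment (1) clusters. All default values are hypothetical.
    """
    i0 = i1 = i_init
    traj = [(0.0, i0, i1)]
    steps = int(round(t_max / dt))
    for k in range(1, steps + 1):
        # Force of infection: within-cluster term plus cross-cluster term.
        d0 = ((1 - gamma) * i0 * p0 + gamma * i1 * p1) * (1 - i0)
        d1 = ((1 - gamma) * i1 * p1 + gamma * i0 * p0) * (1 - i1)
        i0, i1 = i0 + dt * d0, i1 + dt * d1
        traj.append((k * dt, i0, i1))
    return traj


def final_gap(gamma):
    """Difference between control and treatment infected proportions at t_max."""
    _, i0, i1 = simulate_pair_ode(gamma)[-1]
    return i0 - i1
```

With these settings, `final_gap(0.0)` is the largest gap, `final_gap(0.2)` is smaller, and `final_gap(0.5)` is zero because both clusters then experience the same force of infection.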

The ODE approach is quite comparable to the stochastic approach we chose for the paper. To show this, we
created network clusters with every node connected to every other node in the cluster, performed degree-preserving rewiring,
simulated an infectious process with unit infectivity on the pair as described in the paper, and averaged the proportion
of infections at each time step. Supplementary Figure 1 shows the infection rates over time for a range of mixing
values $\gamma \in \{0.0, 0.1, 0.2, 1\}$. The solid lines show the average of the network simulations. The dashed lines show
a numerical solution to Supplementary Equations 5 and 6. The two are comparable, suggesting that differential
equations and network simulations can describe the same infectious process approximately interchangeably.

Supplementary Figure 1: The proportion of infections over time. The solid line is the mass action rate equation, and
the dashed lines are the mean of simulations of an infectious process on a complete (fully-connected) network. The
infectious process was simulated for $\gamma \in \{0.0, 0.1, 0.2, 1\}$, matching Figure 5. As $\gamma$ approaches 1/2, the difference in
infection rates in two clusters in a pair decreases, demonstrated by the red and blue curves approaching each other.
When $\gamma = 1$, the relative rates of infections switch.


Whereas the differential equation approach assumes individuals contact everyone in the population, infections spreading
through fixed networks only allow contact through existing edges. This redundant contact effect causes infections
through networks to be slightly slower, which is also observable in Supplementary Figure 1.


S2: Modularity and Between-Cluster Mixing Parameter $\gamma$

Our definition of between-cluster mixing parameter $\gamma$ (Equation 2) has a convenient interpretation in terms of findings in
network science. Modularity Q is a measure of how well the individuals in a network and their relationships fit into
mutually exclusive groups. For CRTs, we assume the natural groupings to be the two treatment arms. For two arms with
equal total degree, $Q = 1/2$ when all edges exist within treatment arms, and $Q = -1/2$ when all edges are between the
two treatment arms. The definition of modularity is written in the same terms as $\gamma$:

$$ Q = \frac{1}{2m} \sum_{ij} \left[ A_{ij} - \frac{k_i k_j}{2m} \right] \delta(r_i, r_j) \qquad (7) $$

where m is the total number of edges, $k_i = \sum_j A_{ij}$ is the degree of individual i, and $\delta(r_i, r_j) = 1$ if individuals
i and j share the same treatment assignment and 0 otherwise. If the individuals in the two treatment arms have equal
numbers of edges, then $(2m)^{-2} \sum_{ij} k_i k_j \, \delta(r_i, r_j) = 1/2$ and
$\gamma = 1/2 - Q$. Therefore, if modularity can be computed, so can the mixing between the two treatment arms. More
generally, $\gamma$ is entirely a function of cluster structure matrix A and treatment assignments, so if an experimenter
knows the structure of relationships among individuals in the study, they may estimate the amount of
mixing between the two treatment arms.
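
The relationship between modularity and mixing can be verified on a small example. The Python sketch below assumes the standard Newman definition of modularity with the two treatment arms as groups, and takes the mixing parameter to be the fraction of edges joining different arms; the function name and the six-node toy network are hypothetical.

```python
def mixing_and_modularity(edges, arm):
    """Return (gamma, Q) for an undirected, unweighted edge list.

    edges : list of (u, v) pairs, each edge listed once
    arm   : dict mapping each node to its treatment arm (0 or 1)
    """
    m = len(edges)
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    # Fraction of edges running between arms.
    gamma = sum(arm[u] != arm[v] for u, v in edges) / m
    # Total degree carried by each arm.
    arm_degree = {}
    for node, k in degree.items():
        arm_degree[arm[node]] = arm_degree.get(arm[node], 0) + k
    # Q = (fraction of edges within arms) - sum over arms of (arm degree / 2m)^2.
    q = (1 - gamma) - sum((d / (2 * m)) ** 2 for d in arm_degree.values())
    return gamma, q


# Toy network: nodes 0-2 are in arm 0, nodes 3-5 in arm 1; each arm has total degree 6.
edges = [(0, 1), (1, 2), (3, 4), (4, 5), (2, 3), (0, 5)]
arm = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
gamma, q = mixing_and_modularity(edges, arm)
```

Here two of the six edges cross arms, so the mixing parameter is 1/3 and the modularity is 1/6, which agrees with the identity for arms of equal total degree.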


S3: Details on the Stochastic Blockmodel 

A stochastic blockmodel (SBM) is a probabilistic network model, which means that the probability of an edge existing
between nodes i and j is specified by probability $p_{ij}$. The SBM assumes that each network node is a member of exactly
one block in a partition of b blocks $B = \{B_1, \dots, B_b\}$, and the probability $p_{ij}$ of a connection between nodes i and j
depends only on each node's block membership. Denote the block membership of node i as $B_i$. A probability matrix
$P_{b \times b}$ describes all edge probabilities for a network, with $p_{ij} = P_{B_i, B_j}$.
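
To make the definition concrete, the following Python sketch draws one undirected network from a stochastic blockmodel given block memberships and a block probability matrix. It is a generic sampler written for illustration, with a hypothetical function name, and it does not reproduce the specific lattice arrangement or edge probabilities used in the study.

```python
import random


def sample_sbm(block_of, P, seed=0):
    """Sample an undirected SBM adjacency matrix (list of lists of 0/1).

    block_of : list giving the block index of each node
    P        : square matrix (list of lists) of edge probabilities between blocks
    """
    rng = random.Random(seed)
    n = len(block_of)
    A = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):  # each unordered pair once, no self-loops
            # Edge probability depends only on the two block memberships.
            if rng.random() < P[block_of[i]][block_of[j]]:
                A[i][j] = A[j][i] = 1
    return A


# Two hypothetical blocks of three nodes: dense within blocks, sparse between.
blocks = [0, 0, 0, 1, 1, 1]
P = [[0.9, 0.1],
     [0.1, 0.9]]
A = sample_sbm(blocks, P, seed=1)
```

Setting an off-diagonal entry of `P` to zero forbids edges between the corresponding blocks, which is how non-adjacent blocks in a lattice arrangement could be encoded.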

In our study, we imitated within-cluster community structure using an SBM. We assume each cluster is composed of
blocks arranged in a triangular lattice structure. Blocks may be thought of as groups of nodes near each other in geographic
location, and while most edges are contained within each block, blocks share a few edges according to a triangular
spatial pattern. We organized clusters into 10 equally-sized blocks, and individuals within each block are connected
to others within their block such that the average within-block degree is a fixed fraction of $\langle k \rangle$. For between-block
connections, we assume that each block shares a total between-block degree, also a fixed fraction of $\langle k \rangle$, with adjacent blocks
according to the lattice structure, and no edges with all other blocks. A diagram of this network ensemble is shown
in Supplementary Figure 2.

Supplementary Figure 2: 10 communities or blocks within clusters were created according to the stochastic block-
model, with a small probability of community ties in a triangular lattice. Edge probabilities were selected to preserve 
the average degree of a random network. 


S4: The ICC 

The Intracluster Correlation Coefficient (ICC) is a measure of the average correlation between individual outcomes
within a cluster. The ICC assumes that the correlation is identical for all pairs of individuals within a cluster, and is
constant across clusters. The ICC can also be expressed as the ratio of between-cluster variance to the total outcome
variance in the study. In the case of binary outcomes, this value may be expressed as

$$ \mathrm{ICC} = 1 - \frac{\langle \pi_c (1 - \pi_c) \rangle}{\langle \pi_c \rangle \, (1 - \langle \pi_c \rangle)} \qquad (8) $$

where $\pi_c$ is the proportion of infections in cluster c and $\langle \cdot \rangle$ is the average over all clusters in a trial. We calculated
this value for each network ensemble and value of $\gamma$ in our simulations. These results are shown in Supplementary
Figure 3.

Supplementary Figure 3: ICCs from Scenario 1, averaged over all simulations. ICC values are shown for unit
infectivity (Panel a) and degree infectivity (Panel b), as well as each within-cluster structure and extent of between- 
cluster mixing specified in our simulations. 
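
A binary-outcome ICC of this kind can be computed directly from per-cluster infection proportions. The Python sketch below assumes equally sized clusters and the variance-ratio form, in which the ICC equals one minus the average within-cluster variance divided by the total variance; the function name `icc_binary` and the example proportions are hypothetical.

```python
def icc_binary(proportions):
    """ICC for binary outcomes from per-cluster infection proportions.

    Assumes equally sized clusters and uses
    ICC = 1 - mean(pi_c * (1 - pi_c)) / (mean(pi_c) * (1 - mean(pi_c))).
    """
    n = len(proportions)
    mean_pi = sum(proportions) / n
    within = sum(pi * (1 - pi) for pi in proportions) / n  # average within-cluster variance
    total = mean_pi * (1 - mean_pi)                        # total outcome variance
    return 1 - within / total


# Hypothetical trial with four clusters averaging 10% infected.
icc_example = icc_binary([0.05, 0.15, 0.08, 0.12])
```

When every cluster has the same infection proportion the function returns 0, and when clusters are either fully infected or fully uninfected it returns 1, which matches the interpretation of the ICC as the share of outcome variance lying between clusters.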


These values are quite low, but not very far from typical values, and lower values have been reported in actual
trials. These values for the ICC are low because in our design, the data are collected for each cluster pair when
the average proportion of infections within each pair is 10%, which results in relatively low variation in infection
proportions for each cluster.

Like power, the relative value of the ICC depends on within-cluster structure, the amount of between-cluster mixing,
and infectivity. In the case of unit infectivity, the ICC shrinks as between-cluster mixing increases for all within-cluster
structures. However, in many power calculation formulas, lower values of the ICC indicate increased power, not
less. This shows that even if sample size calculations account for within-cluster correlations as measured by the ICC,
power can be reduced by other trial features, such as the extent of between-cluster mixing.


S5: Degree Distribution for an Empirical Cell Phone Network 


The main paper specifies two definitions for an edge between callers in the cell phone network, which are, respectively,
unweighted or weighted by the total number of calls made between each pair of callers. The empirical
degree distributions for both definitions are shown in Supplementary Figure 4.

Supplementary Figure 4: The empirical degree distribution for the calling network dataset. Panel a corresponds to
Definition 1 (unweighted), and Panel b corresponds to Definition 2 (weighted). 


Focusing on Panel a, we notice three distinct regimes. The vast majority of callers make calls with 1 to 100 others.
The distribution of those who call a large number (100 to 1000) of others follows a nearly straight line on these log-log
plots, which is indicative of a power law for this segment. Finally, a few singular callers are found to call a very large
number (> 1000) of callers within the quarter. The general shape is similar for both the unweighted and weighted
definitions. This degree distribution is in accordance with similar datasets analyzed in the literature.